NSF PAR Search | NSF Public Access Repository

When is Agnostic Reinforcement Learning Statistically Tractable?

Jia, Zeyu; Li, Gene; Rakhlin, Alexander; Sekhari, Ayush; Srebro, Nathan (March 2025, Advances in Neural Information Processing Systems 36)
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Pi, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an epsilon-suboptimal policy with respect to Pi? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set Pi and is independent of the MDP dynamics. With a generative model, we show that the spanning capacity characterizes PAC learnability for every policy class Pi. However, for online RL, the situation is more subtle. We show there exists a policy class Pi with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure which in conjunction with bounded spanning capacity enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as recent developments for reachable-state identification and policy evaluation in reward-free exploration.
Free, publicly-accessible full text available March 30, 2026
Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL

Wagenmaker, Andrew; Huang, Kevin; Ke, Liyiming; Jamieson, Kevin; Gupta, Abhishek (April 2025, Curran Associates, Inc.)
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Free, publicly-accessible full text available April 1, 2026
Inexact Augmented Lagrangian Methods for Conic Optimization: Quadratic Growth and Linear Convergence

Liao, Feng-Yi; Ding, Lijun; Zheng, Yang (December 2024, Curran Associates, Inc.)
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Augmented Lagrangian Methods (ALMs) are widely employed in solving constrained optimization problems, and several efficient solvers have been developed based on this framework. Under the quadratic growth assumption, it is known that the dual iterates and the Karush–Kuhn–Tucker (KKT) residuals of ALMs applied to conic programs converge linearly. In contrast, the convergence rate of the primal iterates has remained elusive. In this paper, we resolve this challenge by establishing new quadratic growth and error bound properties for primal and dual conic programs under the standard strict complementarity condition. Our main results reveal that both primal and dual iterates of the ALMs converge linearly, contingent solely upon the assumptions of strict complementarity and a bounded solution set. This finding provides a positive answer to an open question regarding the asymptotically linear convergence of the primal iterates of ALMs applied to conic optimization.
Full Text Available
Nearly Minimax Optimal Submodular Maximization with Bandit Feedback

Tajdini, Artin; Jain, Lalit; Jamieson, Kevin (December 2024, Curran Associates, Inc.)
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Full Text Available
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning

Wang, Yiping; Chen, Yifang; Yan, Wendan; Fang, Alex; Zhou, Wenjing; Jamieson, Kevin; Du, Simon Shaolei (December 2024, Curran Associates, Inc.)
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Full Text Available
Randomized Strategic Facility Location with Predictions

Balkanski, Eric; Gkatzelis, Vasilis; Shahkarami, Golnoosh (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
In the strategic facility location problem, a set of agents report their locations in a metric space and the goal is to use these reports to open a new facility, minimizing an aggregate distance measure from the agents to the facility. However, agents are strategic and may misreport their locations to influence the facility’s placement in their favor. The aim is to design truthful mechanisms, ensuring agents cannot gain by misreporting. This problem was recently revisited through the learning-augmented framework, aiming to move beyond worst-case analysis and design truthful mechanisms that are augmented with (machine-learned) predictions. The focus of this prior work was on mechanisms that are deterministic and augmented with a prediction regarding the optimal facility location. In this paper, we provide a deeper understanding of this problem by exploring the power of randomization as well as the impact of different types of predictions on the performance of truthful learning-augmented mechanisms. We study both the single-dimensional and the Euclidean case and provide upper and lower bounds regarding the achievable approximation of the optimal egalitarian social cost.
Full Text Available
Belief-State Query Policies for User-Aligned POMDPs

Bramblett, Daniel; Srivastava, Siddharth (December 2024, 38th Conference on Neural Information Processing Systems)
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Planning in real-world settings often entails addressing partial observability while aligning with users’ requirements. We present a novel framework for expressing users’ constraints and preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) policies in the setting of goal-oriented partially observable Markov decision processes (gPOMDPs). We present the first formal analysis of such constraints and prove that while the expected cost function of a parameterized BSQ policy w.r.t. its parameters is not convex, it is piecewise constant and yields an implicit discrete parameter search space that is finite for finite horizons. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior with guaranteed user alignment. Analysis proves that our algorithms converge to the optimal user-aligned behavior in the limit. Empirical results show that parameterized BSQ policies provide a computationally feasible approach for user-aligned planning in partially observable settings.
Full Text Available
Fair and Welfare Efficient Constrained Multi-Matchings Under Uncertainty

Lobo, Elita; Payan, Justin; Cousins, Cyrus; Zick, Yair (December 2024, Curran Associates, Inc.)
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
We study fair allocation of constrained resources, where a market designer optimizes overall welfare while maintaining group fairness. In many large-scale settings, utilities are not known in advance, but are instead observed after realizing the allocation. We therefore estimate agent utilities using machine learning. Optimizing over estimates requires trading off mean utilities against their predictive variances. We discuss these trade-offs under two paradigms for preference modeling: in the stochastic optimization regime, the market designer has access to a probability distribution over utilities, and in the robust optimization regime, they have access to an uncertainty set containing the true utilities with high probability. We discuss utilitarian and egalitarian welfare objectives, and we explore how to optimize for them under stochastic and robust paradigms. We demonstrate the efficacy of our approaches on three publicly available conference reviewer assignment datasets. The approaches presented enable scalable constrained resource allocation under uncertainty for many combinations of objectives and preference models.
Full Text Available
Clustering with Non-adaptive Subset Queries

Black, H; Lee, E; Mazumdar, A; Saha, B (December 2024, NeurIPS Proceedings)
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Full Text Available
Generalized Protein Pocket Generation with Prior-Informed Flow Matching

Zhang, Zaixi; Zitnik, Marinka; Qi, Liu (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Designing ligand-binding proteins, such as enzymes and biosensors, is essential in bioengineering and protein biology. One critical step in this process involves designing protein pockets, the protein interface that binds the ligand. Current approaches to pocket generation often suffer from time-intensive physical computations or template-based methods, as well as compromised generation quality because domain knowledge is overlooked. To tackle these challenges, we propose PocketFlow, a generative model that incorporates protein-ligand interaction priors based on flow matching. During training, PocketFlow learns to model key types of protein-ligand interactions, such as hydrogen bonds. During sampling, PocketFlow leverages multi-granularity guidance (overall binding affinity and interaction geometry constraints) to facilitate generating high-affinity and valid pockets. Extensive experiments show that PocketFlow outperforms baselines on multiple benchmarks, e.g., achieving an average improvement of 1.29 in Vina Score and 0.05 in scRMSD. Moreover, modeling interactions makes PocketFlow a generalized generative model across multiple ligand modalities, including small molecules, peptides, and RNA.
Full Text Available
